Problem Statement:
The Netflix dataset describes the titles available on Netflix's streaming platform. For each title it records fields such as director, cast, rating, genre (listed_in), and type (Movie or TV Show). The goal is to analyze this data and extract insights that could help Netflix decide which types of shows and movies to produce and how to grow the business in different countries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
df = pd.read_csv(r"F:\data_set\netflix_dataset.csv")
df.head(5)
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB
df.isna().sum().sum()
4307
df.loc[[3]].isna().sum().sum()
3
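The two totals above count missing cells across the whole frame and within a single row. A per-column breakdown is usually easier to act on. The following is a minimal sketch on a hypothetical miniature frame (`sample` and its values are invented for illustration, not taken from the dataset); the same expressions could be applied to `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Netflix frame (illustrative values only)
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", np.nan, np.nan, "Y"],
    "country": ["India", "Japan", np.nan, "India"],
})

# Count and percentage of missing cells per column, worst column first
missing = pd.DataFrame({
    "missing": sample.isna().sum(),
    "percent": (sample.isna().mean() * 100).round(2),
}).sort_values("missing", ascending=False)

print(missing)
```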
df.dropna(axis=1, how='all')
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8802 | s8803 | Movie | Zodiac | David Fincher | Mark Ruffalo, Jake Gyllenhaal, Robert Downey J... | United States | November 20, 2019 | 2007 | R | 158 min | Cult Movies, Dramas, Thrillers | A political cartoonist, a crime reporter and a... |
| 8803 | s8804 | TV Show | Zombie Dumb | NaN | NaN | NaN | July 1, 2019 | 2018 | TV-Y7 | 2 Seasons | Kids' TV, Korean TV Shows, TV Comedies | While living alone in a spooky town, a young g... |
| 8804 | s8805 | Movie | Zombieland | Ruben Fleischer | Jesse Eisenberg, Woody Harrelson, Emma Stone, ... | United States | November 1, 2019 | 2009 | R | 88 min | Comedies, Horror Movies | Looking to survive in a world taken over by zo... |
| 8805 | s8806 | Movie | Zoom | Peter Hewitt | Tim Allen, Courteney Cox, Chevy Chase, Kate Ma... | United States | January 11, 2020 | 2006 | PG | 88 min | Children & Family Movies, Comedies | Dragged from civilian life, a former superhero... |
| 8806 | s8807 | Movie | Zubaan | Mozez Singh | Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan... | India | March 2, 2019 | 2015 | TV-14 | 111 min | Dramas, International Movies, Music & Musicals | A scrappy but poor boy worms his way into a ty... |
8807 rows × 12 columns
df.dropna(axis=1, how='any')
| | show_id | type | title | release_year | listed_in | description |
|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | 2020 | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | 2021 | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | 2021 | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | 2021 | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | 2021 | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |
| ... | ... | ... | ... | ... | ... | ... |
| 8802 | s8803 | Movie | Zodiac | 2007 | Cult Movies, Dramas, Thrillers | A political cartoonist, a crime reporter and a... |
| 8803 | s8804 | TV Show | Zombie Dumb | 2018 | Kids' TV, Korean TV Shows, TV Comedies | While living alone in a spooky town, a young g... |
| 8804 | s8805 | Movie | Zombieland | 2009 | Comedies, Horror Movies | Looking to survive in a world taken over by zo... |
| 8805 | s8806 | Movie | Zoom | 2006 | Children & Family Movies, Comedies | Dragged from civilian life, a former superhero... |
| 8806 | s8807 | Movie | Zubaan | 2015 | Dramas, International Movies, Music & Musicals | A scrappy but poor boy worms his way into a ty... |
8807 rows × 6 columns
def fun(df):
    # fillna(..., inplace=True) returns None, so return the filled copy instead
    return df.fillna(1)
df.shape
(8807, 12)
df.rename(columns={"listed_in": "genre"},inplace=True)
df.head(5)
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | genre | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | s2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | s3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | s4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | s5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |
df["show_id"]=df["show_id"].apply(lambda x: x.replace("s",""))
df.head(5)
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | genre | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 min | Documentaries | As her father nears the end of his life, filmm... |
| 1 | 2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | 3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | 4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | 5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |
df.describe(include="object").T
| | count | unique | top | freq |
|---|---|---|---|---|
| show_id | 8807 | 8807 | 1 | 1 |
| type | 8807 | 2 | Movie | 6131 |
| title | 8807 | 8807 | Dick Johnson Is Dead | 1 |
| director | 6173 | 4528 | Rajiv Chilaka | 19 |
| cast | 7982 | 7692 | David Attenborough | 19 |
| country | 7976 | 748 | United States | 2818 |
| date_added | 8797 | 1767 | January 1, 2020 | 109 |
| rating | 8803 | 17 | TV-MA | 3207 |
| duration | 8804 | 220 | 1 Season | 1793 |
| genre | 8807 | 514 | Dramas, International Movies | 362 |
| description | 8807 | 8775 | Paranormal activity at a lush, abandoned prope... | 4 |
plt.title("Percentage of Movies vs TV Shows available on Netflix as of 2021")
df['type'].value_counts().plot(kind = "pie", autopct="%.2f")
circle = plt.Circle((0, 0), 0.38, color='white')
plt.gcf().gca().add_artist(circle)
plt.show()
df["director"].value_counts().head(10)
Rajiv Chilaka             19
Raúl Campos, Jan Suter    18
Marcus Raboy              16
Suhas Kadav               16
Jay Karas                 14
Cathy Garcia-Molina       13
Martin Scorsese           12
Youssef Chahine           12
Jay Chapman               12
Steven Spielberg          11
Name: director, dtype: int64
df_dir=df[["title","director"]]
df_dir["director"]=df_dir["director"].str.split(", ")
df_dir= df_dir.explode("director")
df_dir["director"].value_counts().head(10)
Rajiv Chilaka          22
Jan Suter              21
Raúl Campos            19
Suhas Kadav            16
Marcus Raboy           16
Jay Karas              15
Cathy Garcia-Molina    13
Jay Chapman            12
Youssef Chahine        12
Martin Scorsese        12
Name: director, dtype: int64
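The split-then-explode pattern used for directors recurs throughout this notebook (cast, country, genre). One way to keep it consistent is a small helper. The sketch below is an assumption, not part of the original analysis: `explode_column` is a hypothetical name, and the sample frame is invented. It also strips whitespace, so "A, B" and "A,B" yield the same names.

```python
import numpy as np
import pandas as pd

def explode_column(frame, column, sep=","):
    """Return a copy of `frame` with one row per item in a delimited text column."""
    out = frame.copy()
    out[column] = out[column].str.split(sep)   # text -> list of items (NaN stays NaN)
    out = out.explode(column)                  # one row per list item
    out[column] = out[column].str.strip()      # drop spaces left around the separator
    return out

# Hypothetical sample with a co-directed title and a missing director
sample = pd.DataFrame({
    "title": ["T1", "T2", "T3"],
    "director": ["Raúl Campos, Jan Suter", "Jan Suter", np.nan],
})

exploded = explode_column(sample, "director")
counts = exploded["director"].value_counts()
print(counts)
```

Working on a copy keeps the original frame unchanged, which avoids the chained-assignment warnings that the notebook currently silences.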
sns.set_theme(style="whitegrid")
sns.set(rc={"axes.facecolor":"lightgrey"})
plt.figure(figsize= (10,6))
sns.countplot(palette="icefire",y=df_dir["director"],order=pd.value_counts(df_dir["director"]).head(10).index)
plt.xticks( fontsize=10)
plt.title("Top 10 director ")
plt.show()
df_actor=df[["title","cast"]]
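# Split on ", " (comma plus space) so names do not keep a leading space and get counted separately
df_actor["cast"]=df_actor["cast"].str.split(", ")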
df_actor= df_actor.explode("cast")
sns.set_theme(style="whitegrid")
sns.set(rc={"axes.facecolor":"lightgrey"})
plt.figure(figsize= (10,6))
sns.countplot(palette="mako",y=df_actor["cast"],order=pd.value_counts(df_actor["cast"]).head(10).index)
plt.xticks( fontsize=10)
plt.title("Top 10 Actors ")
plt.show()
df_country=df[["country"]]
df_country["country"]=df_country["country"].str.split(", ")
df_country= df_country.explode("country")
df_country["country"].value_counts().head(10)
United States     3689
India             1046
United Kingdom     804
Canada             445
France             393
Japan              318
Spain              232
South Korea        231
Germany            226
Mexico             169
Name: country, dtype: int64
sns.set_theme(style="whitegrid")
sns.set(rc={"axes.facecolor":"lightgrey"})
plt.figure(figsize= (10,6))
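# Explode countries together with the title type so co-productions are counted under each country
df_country_type = df[["type", "country"]].assign(country=df["country"].str.split(", ")).explode("country").reset_index(drop=True)
sns.countplot(palette="mako", data=df_country_type, x="country", hue="type", order=df_country["country"].value_counts().head(10).index)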
plt.xticks( fontsize=10)
plt.title("Top 10 Country Based Number Content")
plt.show()
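# Country mentions (a co-production counts once per country) in the top 10, relative to the number of titles
total_count = df["title"].value_counts().sum()
top_10 = df_country["country"].value_counts().head(10).sum()
print(round((top_10/total_count)*100,2),"%")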
85.76 %
total_count = df["title"].value_counts().sum()
top_5 = df_country["country"].value_counts().head(5).sum()
print(round((top_5/total_count)*100,2),"%")
72.41 %
total_count = df["title"].value_counts().sum()
top_1 = df_country["country"].value_counts().head(1).sum()
print(round((top_1/total_count)*100,2),"%")
41.89 %
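The three percentages above divide country mentions (counted after exploding) by the number of titles, so a title produced in two top countries contributes twice to the numerator. A possible alternative is the share of titles that involve at least one top country. The sketch below contrasts the two on a hypothetical five-title sample; all names and values are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample: one co-production and one title with no country
sample = pd.DataFrame({
    "title": ["T1", "T2", "T3", "T4", "T5"],
    "country": ["United States, India", "India", "United States", "Japan", None],
})

# One row per (title, country); the original row label is kept as the index
countries = sample["country"].str.split(", ").explode()
top_countries = countries.value_counts().head(2).index  # United States and India

in_top = countries.isin(top_countries)

# Mentions-based share: the co-production T1 is counted twice
mention_share = round(in_top.sum() / len(sample) * 100, 2)

# Title-based share: a title counts once if any of its countries is in the top set
title_share = round(in_top.groupby(level=0).any().mean() * 100, 2)

print(mention_share, title_share)
```

On real data the mentions-based figure is always at least as large as the title-based one, and the gap grows with the number of co-productions.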
df_genre=df[["genre"]]
df_genre["genre"]=df_genre["genre"].str.split(", ")
df_genre= df_genre.explode("genre")
df_genre["genre"].value_counts().head(10)
International Movies        2752
Dramas                      2427
Comedies                    1674
International TV Shows      1351
Documentaries                869
Action & Adventure           859
TV Dramas                    763
Independent Movies           756
Children & Family Movies     641
Romantic Movies              616
Name: genre, dtype: int64
sns.set_theme(style="whitegrid")
sns.set(rc={"axes.facecolor":"lightgrey"})
sns.countplot(palette="magma",y=df_genre["genre"],order=pd.value_counts(df_genre["genre"]).head(10).index)
plt.xticks( fontsize=10)
plt.title("Top 10 genre")
plt.show()
df["release_year"].value_counts().head(10)
2018    1147
2017    1032
2019    1030
2020     953
2016     902
2021     592
2015     560
2014     352
2013     288
2012     237
Name: release_year, dtype: int64
# Name the index and the count column explicitly so the result does not depend on the pandas version
year_wise = df["release_year"].value_counts().head(30).rename_axis("year_of_release").reset_index(name="No_of_movies_and_tv_shows")
sns.set_theme(style="whitegrid")
sns.set(rc={"axes.facecolor":"lightgrey"})
plt.figure(figsize=(10,8))
sns.barplot(data=year_wise,x="year_of_release",y="No_of_movies_and_tv_shows")
plt.xticks(rotation=90 ,fontsize=10)
plt.title("Movies and T.V shows Released Year Wise")
plt.show()
year_wise = pd.value_counts(df["release_year"]).reset_index().head(30)
year_wise.rename(columns={"index":"Year of release","release_year":"No. of titles"}, inplace = True)
df_mov_tv_rel = df[['type', 'release_year']]
# reset_index(name=...) labels the count column directly, whatever default name pandas would give it
df_mov_tv_rel = df_mov_tv_rel.value_counts().reset_index(name='count')
df_mov_tv_rel = df_mov_tv_rel[df_mov_tv_rel["release_year"]>=1992]
plt.figure(figsize = (10, 6))
a = sns.lineplot(data=df_mov_tv_rel, x="release_year", y="count", legend='auto', palette = 'viridis',hue ='type',linewidth = 2, marker = 'o')
for i, j in zip(df_mov_tv_rel["release_year"], df_mov_tv_rel["count"]):
    a.text(i, j, str(j), fontsize = 10)
plt.title("Year wise count of content release (1992 - 2021)")
a.set_ylabel('Release count')
a.set_xlabel('Release Year')
plt.show()
Month = df[["date_added"]].replace(np.nan,"No_date")
Month["Release_month"] = Month["date_added"].apply(lambda x:x.lstrip().split(" ")[0])
Month["Release_month"].value_counts().head(12)
July         827
December     813
September    770
April        764
October      760
August       755
March        742
January      738
June         728
November     705
May          632
February     563
Name: Release_month, dtype: int64
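The counts above are sorted by frequency, which makes seasonal patterns hard to read. A minimal sketch of putting monthly counts into calendar order follows; the `months` series is a hypothetical stand-in for `Month["Release_month"]`, and `month_order` is an assumed helper list.

```python
import pandas as pd

# Hypothetical month labels standing in for Month["Release_month"]
months = pd.Series(["July", "January", "July", "March", "January", "July"])

month_order = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]

# Reindex the frequency table into calendar order; months with no rows get 0
by_calendar = months.value_counts().reindex(month_order, fill_value=0)
print(by_calendar)
```

Labels outside `month_order`, such as the "No_date" placeholder, are dropped by the reindex.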
sns.set_theme(style="whitegrid")
sns.set(rc={"axes.facecolor":"lightgrey"})
sns.countplot(palette="rainbow",y= Month["Release_month"],order=pd.value_counts(Month["Release_month"]).head(12).index)
plt.xticks(fontsize=10)
plt.title("Month wise Content added")
plt.show()
sns.set_theme(style="whitegrid")
plt.figure(figsize=(10,8))
sns.set(rc={"axes.facecolor":"lightgrey"})
sns.countplot(palette="rainbow",x= Month["Release_month"],hue=df["type"],order=pd.value_counts(Month["Release_month"]).head(12).index)
plt.xticks(rotation = 45,fontsize=10)
plt.title("Month Wise Content added")
plt.show()
# Drop the last three entries, which occur once each
df["rating"].value_counts()[:-3]
TV-MA       3207
TV-14       2160
TV-PG        863
R            799
PG-13        490
TV-Y7        334
TV-Y         307
PG           287
TV-G         220
NR            80
G             41
TV-Y7-FV       6
NC-17          3
UR             3
Name: rating, dtype: int64
sns.set_theme(style="whitegrid")
plt.figure(figsize=(10,8))
sns.countplot(palette="magma",x=df["rating"],hue=df["type"],order=pd.value_counts(df["rating"]).head(10).index)
plt.xticks( fontsize=10)
plt.title("Top Rating Based on No. Of Content")
plt.show()
# Take an explicit copy so later column edits do not act on a slice of df
movies_df = df.loc[df["type"] == "Movie"].copy()
movies_df.columns.name=None
df_movie_dir=movies_df[["director"]]
df_movie_dir["director"]=df_movie_dir["director"].str.split(", ")
df_movie_dir= df_movie_dir.explode("director")
sns.set_theme(style="whitegrid")
sns.countplot(palette="viridis",y=df_movie_dir["director"],order=pd.value_counts(df_movie_dir["director"]).head(10).index)
plt.xticks( fontsize=10)
plt.title("Top 10 Movies Directors")
plt.show()
df_movie_cast=movies_df[["cast"]]
df_movie_cast["cast"]=df_movie_cast["cast"].str.split(", ")
df_movie_cast= df_movie_cast.explode("cast")
sns.set_theme(style="whitegrid")
sns.countplot(palette="rainbow",y=df_movie_cast["cast"],order=pd.value_counts(df_movie_cast["cast"]).head(10).index)
plt.xticks( fontsize=10)
plt.title("Top 10 Movies Actors")
plt.show()
# Use the .str accessor directly: astype(str) would turn NaN into the text 'nan', which dropna cannot remove
movies_df['duration'] = movies_df['duration'].str.replace(' min', '')
movies_df.head(5)
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | genre | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Movie | Dick Johnson Is Dead | Kirsten Johnson | NaN | United States | September 25, 2021 | 2020 | PG-13 | 90 | Documentaries | As her father nears the end of his life, filmm... |
| 6 | 7 | Movie | My Little Pony: A New Generation | Robert Cullen, José Luis Ucha | Vanessa Hudgens, Kimiko Glenn, James Marsden, ... | NaN | September 24, 2021 | 2021 | PG | 91 | Children & Family Movies | Equestria's divided. But a bright-eyed hero be... |
| 7 | 8 | Movie | Sankofa | Haile Gerima | Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D... | United States, Ghana, Burkina Faso, United Kin... | September 24, 2021 | 1993 | TV-MA | 125 | Dramas, Independent Movies, International Movies | On a photo shoot in Ghana, an American model s... |
| 9 | 10 | Movie | The Starling | Theodore Melfi | Melissa McCarthy, Chris O'Dowd, Kevin Kline, T... | United States | September 24, 2021 | 2021 | PG-13 | 104 | Comedies, Dramas | A woman adjusting to life after a loss contend... |
| 12 | 13 | Movie | Je Suis Karl | Christian Schwochow | Luna Wedler, Jannis Niewöhner, Milan Peschel, ... | Germany, Czech Republic | September 23, 2021 | 2021 | TV-MA | 127 | Dramas, International Movies | After most of her family is murdered in a terr... |
movies_df.dropna(subset=["duration"],axis=0,inplace=True)
movies_df["duration"]=movies_df["duration"].astype(float)
movies_df["duration"].describe()
count    6128.000000
mean       99.577187
std        28.290593
min         3.000000
25%        87.000000
50%        98.000000
75%       114.000000
max       312.000000
Name: duration, dtype: float64
sns.boxplot(x=movies_df['duration'])
plt.show()
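The boxplot draws its whiskers with seaborn's default 1.5 × IQR rule, so the limits beyond which a movie is drawn as an outlier can be worked out from the quartiles reported by `describe()` above. A small sketch of that arithmetic follows; the variable names are illustrative.

```python
# Quartiles taken from the describe() output above (minutes)
q1, q3 = 87.0, 114.0

iqr = q3 - q1              # interquartile range
lower = q1 - 1.5 * iqr     # durations below this are drawn as outliers
upper = q3 + 1.5 * iqr     # durations above this are drawn as outliers

print(iqr, lower, upper)   # 27.0 46.5 154.5
```

By this rule, movies shorter than about 46 minutes or longer than about 154 minutes appear as individual points, which includes both the 3-minute minimum and the 312-minute maximum.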
df_movie_genre=movies_df[["genre"]]
df_movie_genre["genre"]=df_movie_genre["genre"].str.split(", ")
df_movie_genre= df_movie_genre.explode("genre")
sns.set_theme(style="whitegrid")
sns.countplot(palette="rainbow",y=df_movie_genre["genre"],order=pd.value_counts(df_movie_genre["genre"]).head(10).index)
plt.xticks( fontsize=10)
plt.title("Top 10 Movies Genre")
plt.show()
# Take an explicit copy so later column edits do not act on a slice of df
tv_show_df = df.loc[df["type"] == "TV Show"].copy()
tv_show_df.columns.name=None
tv_show_df.head(5)
| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | genre | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | TV Show | Blood & Water | NaN | Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... | South Africa | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, TV Dramas, TV Mysteries | After crossing paths at a party, a Cape Town t... |
| 2 | 3 | TV Show | Ganglands | Julien Leclercq | Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Crime TV Shows, International TV Shows, TV Act... | To protect his family from a powerful drug lor... |
| 3 | 4 | TV Show | Jailbirds New Orleans | NaN | NaN | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | Docuseries, Reality TV | Feuds, flirtations and toilet talk go down amo... |
| 4 | 5 | TV Show | Kota Factory | NaN | Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... | India | September 24, 2021 | 2021 | TV-MA | 2 Seasons | International TV Shows, Romantic TV Shows, TV ... | In a city of coaching centers known to train I... |
| 5 | 6 | TV Show | Midnight Mass | Mike Flanagan | Kate Siegel, Zach Gilford, Hamish Linklater, H... | NaN | September 24, 2021 | 2021 | TV-MA | 1 Season | TV Dramas, TV Horror, TV Mysteries | The arrival of a charismatic young priest brin... |
tv_show_df['duration'].astype(str)
1 2 Seasons
2 1 Season
3 1 Season
4 2 Seasons
5 1 Season
...
8795 2 Seasons
8796 2 Seasons
8797 3 Seasons
8800 1 Season
8803 2 Seasons
Name: duration, Length: 2676, dtype: object
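The TV-show durations are text such as "2 Seasons", so they sort alphabetically and cannot be averaged. If a numeric season count is wanted, one possible approach is to extract the leading number. The sketch below uses a hypothetical `durations` series rather than the real column.

```python
import pandas as pd

# Hypothetical values in the same format as tv_show_df['duration']
durations = pd.Series(["2 Seasons", "1 Season", "3 Seasons", None])

# Pull out the digits, convert to numbers, and keep missing values as <NA>
seasons = pd.to_numeric(durations.str.extract(r"(\d+)", expand=False)).astype("Int64")

print(seasons)
print(seasons.max())
```

The nullable `Int64` dtype is used here so that missing durations stay missing instead of forcing the whole column to float.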
sns.set_theme(style="whitegrid")
ax=sns.countplot(palette="magma",x=tv_show_df['duration'],order=pd.value_counts(tv_show_df['duration']).index)
plt.xticks(rotation = 90)
for p in ax.patches:
    ax.annotate(str(p.get_height()), (p.get_x() * 1.005, (p.get_height() * 1.005)))
plt.title("TV Shows duration ")
plt.show()
df_tv_dir=tv_show_df[["director"]]
df_tv_dir["director"]=df_tv_dir["director"].str.split(", ")
df_tv_dir= df_tv_dir.explode("director")
sns.set_theme(style="whitegrid")
sns.countplot(palette="rocket",y=df_tv_dir["director"],order=pd.value_counts(df_tv_dir["director"]).head(10).index)
plt.xticks(rotation=90, fontsize=10)
plt.title("Top 10 TV Show Directors")
plt.show()
df_tv_cast=tv_show_df[["cast"]]
df_tv_cast["cast"]=df_tv_cast["cast"].str.split(", ")
df_tv_cast= df_tv_cast.explode("cast")
sns.set_theme(style="whitegrid")
sns.countplot(palette="mako",y=df_tv_cast["cast"],order=pd.value_counts(df_tv_cast["cast"]).head(10).index)
plt.xticks(rotation=90, fontsize=10)
plt.title("Top 10 TV Show Actors")
plt.show()
df_tv_genre=tv_show_df[["genre"]]
df_tv_genre["genre"]=df_tv_genre["genre"].str.split(", ")
df_tv_genre= df_tv_genre.explode("genre")
sns.set_theme(style="whitegrid")
sns.countplot(palette="mako",y=df_tv_genre["genre"],order=pd.value_counts(df_tv_genre["genre"]).head(10).index)
plt.xticks(rotation=90, fontsize=10)
plt.title("Top 10 TV Show Genre")
plt.show()
Date = df[['date_added']].dropna()
Date['date'] = Date['date_added'].apply(lambda x : x.lstrip().split(' ')[1].replace(",",""))
Date['months'] = Date['date_added'].apply(lambda x : x.lstrip().split(' ')[0])
Date['years'] = Date['date_added'].apply(lambda x : x.split(', ')[-1])
Order_Of_Month = ['January', 'February', 'March', 'April', 'May','June', 'July', 'August', 'September', 'October', 'November','December'][::-1]
df8 = Date.groupby('years')['months'].value_counts().unstack().fillna(0)[Order_Of_Month].T[::-1]
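The cell above pulls day, month, and year out of `date_added` by splitting the text. An alternative sketch, assuming every non-missing value follows the "Month D, YYYY" pattern seen in the preview, is to parse the column once with `pd.to_datetime` and read the parts from the `.dt` accessor. The `raw` series below is hypothetical.

```python
import pandas as pd

# Hypothetical values in the date_added format, including a leading space and a missing entry
raw = pd.Series([" September 25, 2021", "January 1, 2020", None])

# Strip stray whitespace, then parse; anything unparseable becomes NaT
parsed = pd.to_datetime(raw.str.strip(), format="%B %d, %Y", errors="coerce")

parts = pd.DataFrame({
    "date": parsed.dt.day,
    "months": parsed.dt.month_name(),
    "years": parsed.dt.year,
})
print(parts)
```

Parsed dates give numeric day and year values directly, so later steps would not need to convert strings before plotting or sorting.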
Date['date'].value_counts().sort_values(ascending=False).reset_index().head()
| | index | date |
|---|---|---|
| 0 | 1 | 2212 |
| 1 | 15 | 687 |
| 2 | 2 | 325 |
| 3 | 16 | 289 |
| 4 | 31 | 274 |
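# Position 30 is the last of the 31 day values in descending order, i.e. the least frequent day of the month
perc_1 = (Date['date'].value_counts().iloc[30]/Date['date'].value_counts().sum())*100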
print((round(perc_1,2)),"%")
1.6 %
plt.figure(figsize=(8,6))
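# distplot is deprecated in seaborn; histplot with kde=True draws the same histogram plus density curve
sns.histplot(Date['date'].astype(int), kde=True, stat="density")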
plt.title('KDE plot for the day wise Content added')
plt.show()
Date = df[['date_added']].dropna()
Date['months'] = Date['date_added'].apply(lambda x : x.lstrip().split(' ')[0])
Date['years'] = Date['date_added'].apply(lambda x : x.split(', ')[-1])
Order_Of_Month = ['January', 'February', 'March', 'April', 'May','June', 'July', 'August', 'September', 'October', 'November','December'][::-1]
df8 = Date.groupby('years')['months'].value_counts().unstack().fillna(0)[Order_Of_Month].T[::-1]
plt.figure(figsize=(17,8))
sns.heatmap(data=df8,annot=True,linewidth=0.5,fmt=".0f",cmap="crest")
plt.title('Frequency Of Content Updated on Netflix', fontsize=20,fontfamily='Arial', fontweight='bold')
plt.show()
df.dropna(subset=["duration", "rating", "date_added"], inplace = True)
round((df.isnull().sum()/df.shape[0]*100),2).sort_values(ascending = False)
director        29.82
country          9.43
cast             9.39
show_id          0.00
type             0.00
title            0.00
date_added       0.00
release_year     0.00
rating           0.00
duration         0.00
genre            0.00
description      0.00
dtype: float64
# Replace NaN values in the cast column with "No Cast"
df["cast"] = df["cast"].fillna("No Cast")
# Replace NaN values in the country column with "Unknown"
df["country"] = df["country"].fillna("Unknown")
round((df.isna().sum()/df.shape[0]*100),2).sort_values(ascending = False)
director        29.82
show_id          0.00
type             0.00
title            0.00
cast             0.00
country          0.00
date_added       0.00
release_year     0.00
rating           0.00
duration         0.00
genre            0.00
description      0.00
dtype: float64